VOLUME 66 WHOLE NO. 337
NUMBER 5 1952


Psychological Monographs:
General and Applied

Combining the Applied Psychology Monographs and the Archives of Psychology
with the Psychological Monographs


HERBERT S. CONRAD, Editor


The Validity of Personality-Trait
Ratings Based on Projective
Techniques

HENRY SAMUELS


Veterans Administration Center 
Columbus, Ohio 


Based on a dissertation submitted in partial fulfillment of the requirements for the degree 
of Doctor of Philosophy at the University of Michigan 


Accepted for publication July 17, 1951 


Price $1.00 


Published by 


THE AMERICAN PSYCHOLOGICAL ASSOCIATION, INC. 
1515 MASSACHUSETTS AVE., N.W., WASHINGTON 5, D.C. 


COPYRIGHT, 1952, BY THE
AMERICAN PSYCHOLOGICAL ASSOCIATION, INC.


Chapter I. Introduction
Chapter II. Procedures and Methods
Chapter III. Results
Chapter IV. Other Factors Related to Validity
Chapter V. Summary and Conclusions
Bibliography


THE VALIDITY OF PERSONALITY-TRAIT RATINGS
BASED ON PROJECTIVE TECHNIQUES


CHAPTER I
INTRODUCTION¹


Considerable importance has been attached to projective techniques by
clinical psychologists and professional personnel in allied fields. Such
devices have been used in psychiatric diagnosis, academic and industrial
selection, educational and vocational guidance, research studies of
personality development, and in psychotherapy. One of the outstanding
problems with respect to such techniques is their validity in their
various applications.


¹ I wish to acknowledge my indebtedness to all the people who helped to
make this study possible. I am especially grateful to Professor E. Lowell
Kelly for his counsel, instruction, and labor, without stint. Such value
as may be derived from the study must, for the most part, be attributed
to him. I am indebted to Dr. Donald G. Marquis, Dr. George A. Satter,
Dr. Max L. Hutt, and Dr. Wm. Clark Trow for their suggestions and
criticisms, and to Dr. Donald W. Fiske and Dr. Ernest C. Tupes for their
willing help. My wife, Helen, was, as always, inspiring and stimulating,
and, in addition, labored long and arduously with the data. Caroline
Weichlein and Phyllis E. Moore were most zealous with the secretarial
chores.

The opinions expressed herein are the author's and do not necessarily
reflect the views of the Veterans Administration.


A review of the literature on projection
techniques emphasizes the need for
determining the extent to which projective
techniques agree with independent
criteria in the description of personality.
We may have different kinds of validities
depending upon the function projective
techniques are asked to serve.

The purpose of the present investiga- 
tion is to continue the study of the valid- 
ity of projective techniques by consid- 
ering the following questions: 


(1) How well are projective clinicians 
who use the same projective meth- 
od able to describe personality? 
Are there differences among clini- 
cians in this ability? 

(2) Are there differences in the extent 
to which personality is correctly 
described which are related to the 
kind of projective method used? 

(3) Are there differences among attri- 
butes of personality which make 
for differences in the degree to 
which they can be correctly de- 
scribed? 


CHAPTER II
PROCEDURES AND METHODS


The present investigation was done as part of the research project on the
Selection of Clinical Psychologists sponsored by the Veterans
Administration under a contract with the University of Michigan, under
the direction of Professor E. Lowell Kelly (3, 4).


A. THE ASSESSMENT PROGRAM


During the Summer of 1947, 140 students who had been selected by
universities for first-year positions in the Veterans Administration
clinical psychology training program came to Ann Arbor in “classes” of 24
per week to be assessed. During the assessment week a student completed a
battery of paper and pencil objective tests, and had administered to him
in group form the Thematic Apperception Test (10 cards) and the project’s
form of a sentence completion test. During the first day or two the
student was administered the Rorschach and the Bender-Gestalt
individually and at separate sessions. Each student was interviewed twice
(6), filled out a 131-item Biographical Inventory, wrote an
autobiography, was put through a variety of situations, and filled out a
sociometric questionnaire.

The subjects, on whom the ratings investigated in this study were made,
consisted of 128 male college graduates who had been accepted by various
universities for graduate training in clinical psychology. The twelve
women who were part of the total assessed group have been omitted from
this study.

Because they are crucial to the present study, a detailed discussion is
given to the rating scales, the projective ratings, and the criterion
ratings.


B. THE RATING SCALES 


The rating scales cover 42 variables di- 
vided into three sections which are 
given the letter designations A, B, and C: 
Scale A, consisting of 22 variables, was 
designed to describe more overt or phenotypical
dimensions of personality and
was adapted from Cattell’s (1) factorial 
studies, with modifications made on the 
basis of experience with the scale in 
earlier pilot assessments. In the present 
study 5 of the 22 Scale A variables are 
used. These 5 were selected on the basis 
of a factorial analysis by Fiske (2). Each
of these 5 selected Scale A variables has
the highest loading on one of Fiske’s five
factors, which are nearly orthogonal to
each other. These 5 selected variables
are: 


#4  Depressed - Cheerful
#17 Conscientious - Not Conscientious
#18 Imaginative - Unimaginative
#21 Dependent Minded - Independent Minded
#22 Limited overt emotional expression - Marked overt emotional expression


The definitions of Scale A variables 
were designed to depict these variables as 
sharply as possible as surface behavior, 
i.e., how the individual would appear di- 
rectly to the observer. 

The 9 Scale B variables were designed
to provide opportunity for judgment of
the more covert, genotypical, dynamic,
or interpretive aspects of personality.
Despite the known intercorrelations between
the Scale B variables (from -.57
to .79) all 9 have been used in this study,
since it is in the description of these
aspects of personality that projective

* Complete definitions of all 42 personality
traits may be found in a doctoral dissertation by
Samuels (5, Appendix A).



techniques are commonly thought to be
most useful. These 9 Scale B variables
are:

#23 Social Adjustment
#24 Appropriateness of Emotional Expression
#25 Characteristic Intensity of Inner Emotional Tension
#26 Sexual Adjustment
#27 Motivation for Professional Status
#28 Motivation for Scientific Understanding of People
#29 Insight into Others
#30 Insight into Himself
#31 Quality of Intellectual Accomplishments

Scale C included 11 variables on which 
staff members were asked to make predic- 
tive judgment. Of these, only number 
42, “Over-all Suitability for Clinical Psy- 
chology,” is used here, since this was the 
variable toward which the entire assess- 
ment program was oriented. Through- 
out the remainder of the discussion Vari- 
able 42 is treated as though it were a 
Scale B variable for simplification in sta- 
tistical summary. This assumes that the 
rating on “Over-all Suitability” repre- 
sents an inference about a covert, al- 
though perhaps extremely complex, di- 
mension of personality and is essentially 
similar to Scale B variables. 

All ratings were made on an 8-point 
scale with a theoretical distribution of 
ratings of 3, 7, 15, 25, 25, 15, 7, and 3 
per cent for points from 1 to 8 respec- 
tively. Raters were instructed to use a 
reference population of first-year clini- 
cal psychology graduate students in 
American universities. 
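
As an illustration of what the prescribed distribution implies, the following minimal sketch computes the mean and standard deviation of the 8-point scale from the percentages quoted above, together with the number of ratees expected at each point. The function names are hypothetical, and the group size of 128 is used only for illustration, since the raters' stated reference population was first-year clinical psychology students in general rather than the assessed group.

```python
# Theoretical distribution of ratings quoted in the text (per cent at points 1-8).
SCALE_POINTS = list(range(1, 9))
THEORETICAL_PERCENT = [3, 7, 15, 25, 25, 15, 7, 3]


def distribution_moments(points, percents):
    """Return (mean, standard deviation) of a discrete percentage distribution."""
    total = sum(percents)
    probs = [p / total for p in percents]
    mean = sum(x * p for x, p in zip(points, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(points, probs))
    return mean, variance ** 0.5


def expected_counts(percents, n_ratees):
    """Expected number of ratees at each scale point in a group of n_ratees."""
    return [n_ratees * p / 100 for p in percents]


mean, sd = distribution_moments(SCALE_POINTS, THEORETICAL_PERCENT)
# Illustration only: 128 is the number of subjects in this study.
counts = expected_counts(THEORETICAL_PERCENT, 128)
print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print([round(c, 1) for c in counts])
```

Because the distribution is symmetric, its mean falls at 4.5, and its standard deviation is about 1.55 scale points.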


C. THE PROJECTIVE RATINGS


As was stated above, during the early 
part of the week each student was ex- 
amined with the Rorschach and the 
Bender-Gestalt individually. Each rec- 
ord was scored, interpreted, summarized, 
and the applicant rated on the scales 
described above by the test administrator,
without other knowledge about the
individual than could be gained from 
the particular projective technique and 
the personal contact involved during the 
testing period. The Thematic Appercep- 
tion Test and the project form of a 
sentence-completion test were adminis- 
tered as group tests. Each subject was 
then rated on each variable of Scale A 
and Scale B by four different staff mem- 
bers, each rater basing his ratings on a 
different projective technique. A single 
staff member was concerned with only 
one projective method per subject and 
was instructed to keep himself in ignor- 
ance of the findings of the other staff 
members for that subject. After each of 
the four projective techniques had been 
independently analyzed and subjects
rated on the basis of each, the Rorschach
analyst, serving in a new role of “projective
integrator,” made another set of
ratings on each variable of Scale B based
on all the projective data, with the exception
of the actual ratings which had
been independently made on the basis 
of each projective technique. 

The number of students seen was not the
same for every clinician using a given
technique, nor was the number seen by a
given clinician necessarily the same for
two or more techniques. The number of
students per rater per technique is shown
in Table 1.

A given rater may have analyzed more 
Rorschachs than another, fewer TAT’s 
than any or all of the others, and some 
other number of SC’s. The differences in 
N between the two Scales, A and B, are 
attributable to the fact that cases which 
were integrated by the “projective integrator”
were not rated on Scale A by the
“projective integrator” nor by the other 
projective-technique raters.


TABLE 1
NUMBER OF STUDENTS RATED BY EACH RATER ON FOUR PROJECTIVE TECHNIQUES

[?] marks an entry that is illegible in the scanned source.

                              Rater
Technique          W     O     L     F     T     G

Scale A
Rorschach         11    11    [?]   [?]   [?]   [?]
TAT                7    15    [?]   [?]   [?]   [?]
SC                 4     8    14     4    [?]   [?]
BG                [?]

Scale B
Rorschach         18    23    22    22    20    20
TAT               22    22    14    28     4    25
SC                16    13    37     4    26     6
BG (raters R, Q, M, P)        32    29    33    33

* With an N of three, r is either indeterminate or fluctuates widely.
** With an N of two, r is .00 or 1.00.


The projective raters used in this study are fewer in number than the
total number of projectivists on the assessment staff. Only six are used,
each of whom rated enough students via Rorschach, Thematic Apperception
Test, and sentence completion to make comparisons among them possible.
Four other projective raters were concerned only with the Bender-Gestalt.
The latter four, coded by letters R, Q, M, and P, were graduate students
at the University of Michigan, at the third-year level of training in the
VA training program for clinical psychologists, and had been trained in
the clinical use of the Bender-Gestalt by the same instructor. The other
projectivists are coded W, O, L, F, T, and G. They were initially
selected on the basis of professional competence, and the choice of
another group of projective-analysis raters similarly selected would
probably not have resulted in any better interpretation of the projective
tests. The sampling of techniques and the sampling of variables are both
regarded as adequate. Raters were given full freedom with respect to
scoring, interpretation, and report writing. The only formal imposition
placed upon them was the conversion of impressions, deductions,
inferences, or conclusions into single numbers on each of the rated
variables.

Although students were assigned ran- 
domly for projective examination, it was 
possible that the relatively small number 
of students seen by a given projectivist 
using one of the projective devices might 
have represented a different student 
population from that seen by another 
projective rater using the same projective
instrument. That is, differences between
projectivists’ ratings might be
found which might be attributable to differences
in the students examined and
not to differences in the projective techniques.
As an additional check on the
random assignment of students for projective
testing the following test was
made. Distributions of the FinP ratings*
on students on Variable 42, “Over-all 
Suitability,” were made separately for 
each of the four projective techniques 
used in this study. For a given technique 


* Final Pooled Ratings. PI will be used as an
abbreviation for the Projective Integrator and the
Projective Integration.



these ratings for students were distrib- 
uted for each of the projective raters. 
In this way it was possible to compare 
the mean FinP ratings on Variable 42 
with respect to the student population
seen by each of the projectivists for each 
of the projective techniques. Epsilon 
square was computed and no significant 
differences were found among the mean 
ratings for students assigned to different 
projective raters using the same tech- 
nique. Since the students had been ran- 
domly assigned for projective testing and 
since no differences were found between 
students for Variable 42, there was no 
reason to believe that systematic differ- 
ences would be found between students 
on other variables, and no other tests 
were made. 
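
The check described above can be sketched as follows. The sketch assumes that "epsilon square" refers to the unbiased correlation ratio for a one-way classification, here written as (SS between - (k - 1) MS within) / SS total; the source does not give its formula. The ratings are hypothetical values standing in for the FinP ratings on Variable 42 of the students seen by three raters using one technique, and the function name is likewise an assumption.

```python
def one_way_summary(groups):
    """Return (epsilon-square, F ratio) for a one-way classification.

    groups: one list of criterion ratings per projective rater.
    """
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Unbiased correlation ratio; equivalent to 1 - MS_within / MS_total.
    epsilon_sq = (ss_between - (k - 1) * ms_within) / ss_total
    return epsilon_sq, ms_between / ms_within


# Hypothetical FinP ratings for students assigned to three raters.
hypothetical_ratings = [
    [4, 5, 3, 6, 5],
    [5, 4, 4, 6, 3],
    [4, 6, 5, 3, 5],
]
epsilon_sq, f_ratio = one_way_summary(hypothetical_ratings)
print(f"epsilon-square = {epsilon_sq:.3f}, F = {f_ratio:.3f}")
```

Under this definition epsilon-square is negative whenever F is less than 1, which is ordinarily read as no evidence of differences among the groups; significance would still be judged by referring F to the appropriate table.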


D. THE CRITERION RATINGS


In the seven-day period of each stu- 
dent’s assessment, three staff members 
rated the student at frequent intervals. 
A given staff member would receive cer- 
tain data on a student and rate, then 
receive other data and rate again, having 
filed away his previous ratings, and con- 
tinue to make independent ratings based 
on various kinds of data. The first team 
conference (Preliminary Pooling Con- 
ference) produced a single set of ratings 
which represented a combination of the 
judgments of the staff team based on 
these materials: objective and projective 
tests, credentials, autobiography and Biographical
Inventory, and the information
obtained by way of the interview
(6). At the final pooling conference, all
previous material and all previous ratings
were made available to the staff
team. Also available at this time were
the student’s self-ratings, ratings of each
student by the other three members of
the student team, the separate ratings
based on projective methods, and the
ratings made by a team of staff members 
who had observed the student in situa- 
tion tests with no other knowledge about 
him. A separate final pooled rating was 
made on each variable of Scale A and of 
Scale B. These ratings are regarded as 
the best available measures of each of the 
42 personality traits for each subject as- 
sessed, and are the criterion measures of 
this study. They are designated FinP.

The FinP ratings are the most com- 
prehensive and inclusive ratings to come 
out of the entire week’s assessment. 
These ratings were made by different 
combinations of three staff members. 
The staff were initially selected on the 
basis of professional competence. They 
had had opportunity to study the stu- 
dent in a wide variety of ways, and had 
had the benefit of judgments of many 
other skilled clinicians at various points. 
Although they are admittedly fallible, 
there can be little doubt that these FinP 
ratings are as valid criterion measures of
these variables as are obtainable at the 
present time from skilled clinicians using 
present techniques in an assessment situ- 
ation. 

The FinP ratings are composite cri- 
teria consisting of the pooled judgments 
of three staff members who had available 
to them the results of the tests and pro- 
cedures indicated above. A separate FinP 
rating was made for each variable of 
Scale A and Scale B. Since staff members 
were free to arrive at their final judg- 
ments as they chose, it would be difficult 
to say whether FinP ratings emphasized 
abilities, capacities, past achievement, or 
personality characteristics. Of interest in 
this connection is “. . . the fact that as- 
sessment staff members tended to be uni- 
formly of the opinion that the interview 
contributed most to their ‘understanding 




of the case,’ followed by either the pro- 
jective tests or autobiography” (4, p. 
404). 

Two problems must be considered 
with respect to the relationship between 
the ratings based on projective tech- 
niques and the FinP ratings. First, how 
much influence did any or all of the 
projective techniques have on the cri- 
terion measures? Second, are there differ- 
ences in the criterion ratings when the 
Projective Integrator is and is not a 
member of the criterion team? 

With respect to the first problem of 
the influence of projective techniques on 
criterion measures, a quantitative answer 
cannot be given. The projective tech- 
nique protocols and the ratings based on 
them were available to the final pooling 
team. This constitutes a theoretical
defect in the design of this study which 
was a sacrifice which had to be made in 
consideration of other aspects of the 
over-all project design. It would have 
been preferable, of course, if the ratings 
based on each projective technique could 
have been checked against a criterion 
based on all available data except that 
technique. This deficiency with respect 
to experimental independence would 
have the effect of spuriously raising the 
correlations between the ratings based 
on projective techniques and the cri- 
terion. The possible effect of dominance 
in criterion ratings by any one criterion 
team member tended to be cancelled by 
the rotation of staff members from week 
to week in such a way that for each of 
the student classes the criterion teams 
were differently constituted. This does 
not eliminate the possibility of what 
might be called group bias with respect 
to projective methods. For example,
many of the staff members, not themselves
projectivists, spoke very respectfully
of the value of the Rorschach and
may have given careful attention to the
Rorschach ratings in arriving at the criterion
ratings. The possibility exists that
another staff, or the same staff assessing
at another time, might have another attitude
toward projective methods and
arrive at a different set of criterion ratings
for the same subjects. Since there is small
likelihood of another assessment designed
to measure the reliability of criterion
ratings by the test-retest technique,
these obtained criterion measures
must be regarded as the only ones available,
and therefore, in a practical sense,
as universe (not sample) measures.

The problem of the effect of the Projective
Integrator as a criterion team
member on the criterion team ratings
can be evaluated. The Projective Integrator
was a member of the criterion
team for half the number of poolings.
Since the Projective Integrator was also
the Rorschach analyst, it might be hypothesized
that if he exerted a constant
influence on the criterion ratings in the
direction of his Rorschach ratings, then
the correlations between ratings based on
Rorschach and criterion ratings would
be higher when the Projective Integrator
was a member of the criterion team than
they would when he was not. A comparison
of correlations between Rorschach
ratings and FinP ratings when the Projective
Integrator was and was not a
member of the criterion team found
none of the differences between correlation
coefficients under these conditions
to be significant at the .05 level, nor
was the direction of difference consistent.
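
A comparison of this kind can be sketched with the usual test for the difference between two correlations obtained on separate groups of subjects, in which each r is converted to Fisher's z and the difference is referred to the normal curve. The source does not state which test was used, so the method is an assumption, and the coefficients and group sizes below are hypothetical.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed test of r1 versus r2 from two independent groups.

    Returns (z statistic, two-tailed p value).
    """
    z_diff = math.atanh(r1) - math.atanh(r2)         # Fisher's z transformation
    std_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = z_diff / std_error
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # normal-curve tail area
    return z_stat, p_value


# Hypothetical values: Rorschach-FinP correlation with the Projective
# Integrator on the criterion team (r = .40) and off it (r = .30).
z_stat, p_value = compare_independent_correlations(0.40, 64, 0.30, 64)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

With groups of this size a difference of .10 between coefficients falls well short of the .05 level, which illustrates how large a difference would have been needed for the Integrator's influence to show itself.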




CHAPTER III
RESULTS


A. THE PROJECTIVE TECHNIQUES


1. Validities of ratings based on
projective techniques


To determine the extent to which ratings based on projective techniques
correlated with FinP ratings, the ratings made by the six raters (four in
the case of the Bender-Gestalt) were entered into a single scatter plot
for each technique and variable. It was found that something other than
“chance” operated to generate statistically significant correlations,
since, for Scale A variables, only 1 correlation in 20 was expected to be
significant at the .05 level by chance and we find 9 in 20 significant at
this level. With respect to Scale B, where 40 correlations were computed,
we would expect 2 to reach the .05 level and we find 30 of the 40
correlations are significant at this level. In addition it was noted that
only 2 of the 60 coefficients were negative. These two findings, the
relatively large number of statistically significant correlations and the
high proportion of positive correlations, suggest that the ratings based
on projective techniques have some validity. However, these results are,
to some extent, spurious since, in addition to the fact that the
criterion measures by the six raters were not completely independent, the
intercorrelations among Scale B variables were more often positive than
negative, and fairly high.

The median correlation coefficients between ratings based on each of four
projective techniques and FinP ratings are presented in Table 2. It was
noted that correcting the technique-FinP r’s for attenuation in the FinP
ratings raised the median r only slightly, the greatest increase being
from .31 to .35 in the case of the SC on Scale A. That is to say, that
even had the FinP ratings been perfectly reliable, the ratings based on
projective techniques would have correlated only slightly higher with the
FinP ratings.


TABLE 2
MEDIANS OF CORRELATIONS BETWEEN RATINGS BASED ON EACH OF FOUR PROJECTIVE
TECHNIQUES AND FINAL POOLED RATINGS FOR SCALE A AND SCALE B VARIABLES


                              Technique
Scale            Ror.     n     TAT      n      SC      n      BG      n
Scale A          .20     58     [?]     [?]    [?]    [?]     [?]    [?]
Scale B          .29    125     .26    115     .28    102     .14    127
Both Scales      .27            [?]            [?]            [?]

[?] marks an entry that is illegible in the scanned source.


2. Differences in the validities of ratings
among the projective techniques.


In order to test the hypothesis that there are no differences in the
validities of ratings made on the basis of the four techniques used in
this study the following analysis was made. The validity coefficients for
each of the techniques for each of the 15 variables were transformed into
Fisher’s z function and analysis of variance was done for each of the two
sets of variables. There was no rater variance here since raters had been
combined for each technique. The results of the analysis of variance for
the transformed validity coefficients for techniques for Scale A showed
no significant differences in validity at the .05 level for the
techniques or for the variables. The analysis of variance of the Scale B
variables showed no significant differences between variables but a
difference between techniques significant at the .01 level. Further
analysis showed that this
difference was attributable to the ratings 
based on the Bender-Gestalt; this tech- 
nique differed significantly from each of 
the other three techniques which did not 
differ significantly among themselves. 
Since the Bender-Gestalt raters were four 
different people from those whose ratings 
were based on the other three techniques, 
conclusions from these findings must be 
made with great care. 
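
The analysis described in this section can be sketched as a two-way analysis of variance without replication, with variables as rows and techniques as columns, carried out on the coefficients after Fisher's z transformation. The layout is inferred from the statement that techniques and variables were both tested; the coefficients below are hypothetical and are not the values obtained in this study.

```python
import math


def two_way_anova_no_replication(table):
    """table[i][j] is the entry for row i (variable) and column j (technique)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols      # row-by-column residual
    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_variables": (ss_rows / df_rows) / ms_error,
        "F_techniques": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }


# Hypothetical validity coefficients: 5 variables (rows) by 4 techniques (columns).
r_table = [
    [0.25, 0.30, 0.31, 0.10],
    [0.20, 0.28, 0.35, 0.15],
    [0.18, 0.22, 0.27, 0.12],
    [0.30, 0.26, 0.33, 0.20],
    [0.22, 0.31, 0.29, 0.08],
]
z_table = [[math.atanh(r) for r in row] for row in r_table]
result = two_way_anova_no_replication(z_table)
print(result)
```

Each F ratio would then be compared with the tabled value for its degrees of freedom; for a 5 by 4 table these are 4 and 12 for variables and 3 and 12 for techniques.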


3. Comparison of technique validities on 
Scales A and B 


When the median validity coefficients, 
which are given in Table 2, are ex- 
amined, one is struck by the fact that 
with the exception of the Rorschach the 
coefficients are somewhat higher for Scale 
A, which is not a result that might have 
been readily hypothesized beforehand. It 
may also be noted that while many of 
the median values are statistically signif- 
icant (.05 level) none of them may rea- 
sonably be regarded as predictively use- 
ful. Thus, for the techniques studied 
(with the exception of the Rorschach), 
we see that projective methods do not 
lead to more valid inferences about the 
more presumably covert aspects of per- 
sonality, as is frequently argued, than 
they do of the more obvious aspects of 
personality. On the other hand, and par- 
ticularly with reference to the TAT and 
the SC where the student was not seen 
by the rater (Ror and B-G were administered
by the rater) and behavioral
manifestations of the student were inferred
from the protocol only, we find
that ratings are as valid, by and large,
for overt variables as for covert variables.

The possibility that these findings may 
have been related to the unreliability of 
the criterion ratings has been shown to 
be dismissible. Another criticism of these 
findings may ask if it is fair to require 
clinicians (in this instance projective 
clinicians) to express their impressions 
and judgments in the form of ratings, 
The core of the argument is that clinicians
are accustomed to describing their
findings in the form of language (interpretive
findings beyond the scores, ratios,
percentages, etc. required by the various
test instruments) and not in the form of
ratings. While this problem must, perhaps
necessarily, remain largely a matter
for individual judgment, there is considerable
merit in the argument that
numbers may be more precise than 
words, and are definitely far more useful 
in the verification of hypotheses. This 
position does not deny the possibility 
that ratings may not accurately represent 
the judgments about people which clini- 
cians can make from personality test 
data, nor does it deny the possibility that 
such judgments may be more accurately 
communicated in some form other than 
ratings. 
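
The point about criterion unreliability made at the head of the preceding paragraph rests on the correction for attenuation reported earlier, which raised the median r for the SC on Scale A from .31 to .35. A minimal sketch of that correction, applied to the criterion only, is given below. The criterion reliability is not stated in this section, so the value computed here is merely the one implied by the two rounded coefficients, and the function names are hypothetical.

```python
import math


def correct_for_criterion_attenuation(r_observed, criterion_reliability):
    """Estimate the validity that a perfectly reliable criterion would give."""
    return r_observed / math.sqrt(criterion_reliability)


def implied_criterion_reliability(r_observed, r_corrected):
    """Criterion reliability implied by an observed and a corrected coefficient."""
    return (r_observed / r_corrected) ** 2


# Coefficients quoted in the text for the SC on Scale A.
reliability = implied_criterion_reliability(0.31, 0.35)
corrected = correct_for_criterion_attenuation(0.31, reliability)
print(f"implied reliability = {reliability:.2f}, corrected r = {corrected:.2f}")
```

On these figures the implied reliability of the FinP ratings is roughly .78, subject to the rounding of the published coefficients, which is consistent with the statement that the correction changes the picture very little.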

That there are not more differences in 
validities between the two scales for the 
projective techniques is a curious finding 
and difficult to understand. It is not un- 
reasonable to believe that if one has 
understanding of the dynamics of be- 
havior one can infer one step more to the 
overt behavior of a given individual, but 
in making the additional inferential step 
there is additional opportunity for error. 
It would not have been surprising to find 
that ratings of overt personality characteristics
had less relationship to the




criterion ratings than ratings of the co- 
vert traits. That they do not—and since in 
both cases the relationships with the cri- 
terion ratings are low—suggests further 
study to determine, for example, how 
much correlation there is between the 
criterion measures and ratings based on 
observations of the subject in the pro- 
jective testing situation, and how much 
with ratings based on projective data 
without direct observation of the subject. 


4. Correlations between projective 
technique ratings 


Up to this point the four projective 
techniques have been considered as ge- 
nerically related. They have been called 
by a class name and regarded as methods 
of evaluating personality characteristics. 


We have found that ratings based on each of the four techniques show but
low correlation with the criterion measures. We shall turn to the
question, to what degree are personality ratings based on each of the
separate techniques in agreement? This was investigated by
intercorrelating the ratings, variable by variable, for each of the four
techniques. The results are shown in Table 3. While there is a tendency
toward positive relationships (101 out of 120) among the independent
ratings (no two ratings of a single subject on a given trait were made by
the same projective technique rater), there is very little basis for a
feeling of confidence that these instruments serve similar functions,
i.e., that these instrument-clinician combinations are measuring the same
thing. Although the proportion of positive correlations is large, only 9
correlations out of 120 are significant at the .05 level, where 6
correlations are expected to reach this level by chance.


TABLE 3
CORRELATIONS BETWEEN RATINGS BASED ON EACH OF THE PROJECTIVE TECHNIQUES*

[?] marks an entry that is illegible in the scanned source.

Variable                  Ror/TAT  Ror/SC  Ror/BG  TAT/SC  TAT/BG  SC/BG
Scale A and Scale B
  variables               [entries for single variables illegible]
Median correlation for
  Scale A                   .02     .08     [?]     [?]     [?]     [?]
  Scale B                   .06     .06    -.02     .05     .06     .09
  Both Scales               [?]     [?]     [?]     [?]     [?]     [?]

* Correlation coefficients which are underlined are significant at the
.05 level. For Scale A, N equals 61. For Scale B, N equals 128.


1. Relative validities of ratings for
individual raters


Beyond the techniques per se we are interested in the relative validities
of the individual raters by technique and by variable. In Table 4 may be
found the median validity coefficients for all traits in Scale A and
Scale B respectively, and for both Scales together. These are the median
correlations for each of the six raters for ratings based on the
Rorschach, TAT, and SC, respectively. Table 5 contains the same
information for the Bender-Gestalt.


TABLE 4
MEDIANS OF CORRELATIONS BETWEEN PROJECTIVE TECHNIQUE RATINGS AND FINAL
POOLED RATINGS FOR EACH OF SIX PROJECTIVE RATERS*

Entries are median r, with n in parentheses. [?] marks an entry that is
illegible in the scanned source.

                                     Rater
Scale            W          O          L          F          T          G         All

Rorschach
Scale A         [?]        [?]        [?]        [?]        [?]        [?]        [?]
Scale B         [?] (18)   [?] (23)   .26 (22)   .44 (22)   .48 (20)   [?] (20)   [?]
Both Scales     .30        .21        .22        .44        .48        .30        .19

TAT
Scale A         [?]        [?]        [?]        [?]        [?]        [?]        [?]
Scale B         .26 (22)   .16 (22)   .22 (14)   [?] (28)   .46 (4)    .23 (25)   .28
Both Scales     [?]        [?]        [?]        [?]        [?]        [?]        [?]

SC
Scale A         [?]        [?]        [?]        [?]        [?]        [?]        [?]
Scale B         .04 (16)   [?] (13)   [?] (37)   [?] (4)    .45 (26)   .46 (6)    [?]
Both Scales     .04        .47        .30        .74        .47        --         [?]

* Correlation coefficients which are underlined are significant at the
.05 level.


TABLE 5
MEDIANS OF CORRELATIONS BETWEEN BENDER-GESTALT RATINGS AND FINAL POOLED
RATINGS FOR EACH OF FOUR BENDER-GESTALT RATERS

                                     Rater
Scale            R     n     Q     n     M     n     P     n    All
Scale B         .14   32    .10   29    .22   33    .12   33    .15
Both Scales     [?]


The results obtained (of which only the medians are presented here)
suggest that two of the six Rorschach analysts

were able to make ratings which cor- 
related with the criterion ratings beyond 
chance expectancy at the .05 level. For
these two raters, F and T, the median 
coefficients for Scale B are, respectively,
.44 and .48. These are the only median
coefficients which are statistically signifi- 
cant for the six raters on both scales. 
The number of statistically signifi- 
cant individual (not median) correla- 
tions is, considering both scales, greater 
than would be expected in chance occurrence for any rater, since at the .05 level
only one in twenty coefficients is expected to reach this level by chance. The
median value, however, is regarded as 
a better representation of the validity of 
a rater, in that it allows for the fact of 
intercorrelation between the traits within 
each Scale. 
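
The chance-expectancy reasoning above can be illustrated with a short calculation. This is a minimal sketch and not part of the original analysis: it assumes, for simplicity, that the coefficients are independent, which they are not, since traits within a Scale intercorrelate. The function name and the counts passed to it are illustrative only.

```python
from math import comb

def chance_significant(n_coeffs, alpha=0.05, observed=0):
    """Return (expected count, P(at least `observed` significant)) under the
    simplifying assumption that the n_coeffs tests are independent."""
    expected = n_coeffs * alpha
    # Upper-tail binomial probability: sum of P(X = k) for k >= observed.
    tail = sum(
        comb(n_coeffs, k) * alpha**k * (1 - alpha) ** (n_coeffs - k)
        for k in range(observed, n_coeffs + 1)
    )
    return expected, tail

# Illustrative call: 120 coefficients, of which 9 reach the .05 level.
expected, tail = chance_significant(120, alpha=0.05, observed=9)
print(f"expected by chance: {expected:.1f}; P(at least 9): {tail:.3f}")
```

Because the independence assumption is violated here, the tail probability should be read only as a rough guide; the median coefficient remains the preferred summary in the text.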

While the evaluation of a correlation 
coefficient in terms of what may be called 
its social significance is quite subjective, 
there is little reason for enthusiasm to- 
ward the efficiency with which these six 
Rorschach analysts were able to predict 
the criterion ratings. Although the validity coefficients for Rorschach raters may
seem somewhat discouraging, it is emphasized that the ratings of Rorschach
projectivists F and T appear to be rea- 
sonably valid. This finding is congruent 
with the situation which exists in the 
area of projective testing in which, to a 
very considerable extent, adeptness in 
the use of projective methods is acquired 
through instruction by masters. In the 
practical situation of assessment, how- 
ever, four of the six Rorschach raters, 
initially selected on the basis of clinical 
competence, did not make as valid rat- 
ings as the other two. The high propor- 
tion of positive correlations suggests that, 
under assessment conditions, the Rorschach method provides a basis for some
validity of rating.

These findings and observations also apply, in general, to the data obtained
for raters using the TAT, SC, and BG.


C. RELATIVE CONTRIBUTIONS OF RATERS, 
TECHNIQUES, AND TRAITS TO VALIDITIES 
OF RATINGS 


In order to determine the relative con- 
tributions of each of the three sources of 
variance investigated in this study to the 
validities of ratings, the following analysis was carried out. The validity coefficients
were regarded as scores, and, after
being transformed into their respective z 
functions, were treated by the method of 
analysis of variance. Analyses were done 
separately for Scales A and B, and addi- 
tional analyses for the Bender-Gestalt. 
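
The procedure just described (transforming each validity coefficient to z and then analyzing variance) can be sketched as follows. This is a simplified one-way illustration under stated assumptions: the study examined three sources of variance and their interaction, whereas the sketch compares raters only, and every coefficient in it is invented for the example. The function names are hypothetical.

```python
from math import atanh

def fisher_z(r):
    """Fisher's r-to-z transformation, z = atanh(r)."""
    return atanh(r)

def one_way_f(groups):
    """F ratio for a one-way analysis of variance over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical validity coefficients for three raters (invented numbers).
validities_by_rater = {
    "rater_1": [0.10, 0.22, 0.05, 0.18],
    "rater_2": [0.44, 0.48, 0.30, 0.41],
    "rater_3": [0.12, 0.20, 0.16, 0.08],
}
z_scores = [[fisher_z(r) for r in rs] for rs in validities_by_rater.values()]
print(f"F = {one_way_f(z_scores):.2f}")
```

The transformation is applied before the analysis because the sampling distribution of r is skewed, while that of z is more nearly normal with a variance that does not depend on the size of the correlation.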

For Scale A traits the results indicated 
that significant differences in validity 
may be attributed both to techniques 
and to raters (.05 level) and that the in- 
teraction of techniques and raters makes 
for differences in validity (.01 level). 
That is to say, that for Scale A traits, the 
traits themselves do not contribute sig- 
nificantly to differences in validity. How- 
ever, the different techniques make for 
differences in the validity of ratings and 
the different raters make for differences 
in validity of ratings. We find differences 
in validity dependent upon who rates, 
and upon what technique is being used, 
and differences in validity for the combinations of raters and techniques.

A similar analysis of Scale B traits 
showed significant differences in the 
validity of ratings for technique-rater 
combinations. Further analysis, however, 
showed this difference to be largely the 
contribution of rater differences in valid- 
ity of rating. As with the Scale A traits, 
the traits themselves are not a source of 
differences in validity of ratings. While
the techniques did contribute to differences in rating validity for Scale A traits,
they are not sources of difference for the 
Scale B traits. For the presumably covert 
aspects of personality it appears that dif- 
ferences in validity depend chiefly upon 
the rater. 

Separate analyses of variance for each 
of the two Scales for the Bender-Gestalt 
showed that neither variance due to 
raters nor variance due to traits was sig- 
nificantly greater than variance due to 
chance. Since only the one projective 
method was involved in these analyses, 
there was no technique variance. 

It seemed reasonable to expect that 
some personality traits would be relatively more validly rated than others by
one or all of the projective techniques or 
by one or all of the projective raters. The 
absence of statistically significant differences in validities of ratings attributable
to differences among traits is necessarily 
qualified in that to the extent to which 
there is correlation among traits, it is 
more difficult to find existing differences. 
This is not a particularly cogent objec- 
tion in the case of the Scale A variables 
for which the intercorrelations were low. 
The Scale B traits do have, in some instances, sizable correlations between them,
resulting in an underestimation of the 
significance of the between-variable vari- 
ance. 


CHAPTER IV
OTHER FACTORS RELATED TO VALIDITY


IN THIS chapter are presented data with respect to factors, other than those
originally considered in the scope of this study, which may have had an effect
upon the validity of ratings based on projective techniques. Consideration will
be given to the possibility of bias in rating; to differences in dispersion
(confidence) of ratings; to some additional factors concerning raters; and to the
possible effects of unreliability in the predictor and criterion ratings.


A. BIAS IN RATING


One factor which may be related to the validity of ratings based on projective
techniques is the frame of reference in which ratings were made. By frame of
reference is meant an attitude of optimism represented by a tendency to make
ratings in the direction of the “laudatory” or socially desirable end of the
rating scale. It is possible that if ratings on a given technique tend, consistently,
to be either more or less “laudatory” than the Final Pooled Ratings, this
attitude might be related to the validity of such ratings. Similarly, it is possible
that such differences in attitude might be related to differences between the
relative validities of ratings among the techniques. In order to determine whether
such “frame of reference” differences did obtain, the following was done. First, in
order to compare the ratings by each of the techniques with the FinP ratings,
the significance of the difference between the mean ratings was calculated. This
was done by the usual method of obtaining the quotient of the difference
between means divided by the standard error of the difference. In each instance
the FinP means were subtracted from
the technique means and the arithmetic 
sign recorded in order that the direction 
of difference be known. This procedure 
was followed for the four techniques for 
the fifteen traits studied. A comparison of
the FinP means with those of the Projec- 
tive Integrator was necessarily limited to 
the ten Scale B traits. Second, the mean 
ratings by each of the techniques on the 
fifteen traits were compared with the 
mean ratings by each of the other projec- 
tive techniques, again by calculating the 
significance of each of the differences. 
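
The quotient described above (difference between means divided by the standard error of the difference) can be sketched as below. The text does not say which standard-error formula was used; the sketch assumes the independent-samples form, which is an assumption made here for illustration, since ratings of the same subjects would ordinarily call for a correlated-samples formula. The ratings and the function name are invented.

```python
from math import sqrt
from statistics import mean, variance

def critical_ratio(technique_ratings, criterion_ratings):
    """Difference between means divided by its standard error.

    The criterion mean is subtracted from the technique mean, so a positive
    value means the technique mean is the higher of the two.
    Assumes independent samples (an illustrative simplification).
    """
    diff = mean(technique_ratings) - mean(criterion_ratings)
    se_diff = sqrt(
        variance(technique_ratings) / len(technique_ratings)
        + variance(criterion_ratings) / len(criterion_ratings)
    )
    return diff / se_diff

# Invented ratings on one trait for eight subjects.
technique = [4, 5, 3, 4, 5, 4, 3, 4]
criterion = [5, 6, 5, 4, 6, 5, 5, 6]
print(f"critical ratio = {critical_ratio(technique, criterion):.2f}")
```

Keeping the arithmetic sign, as the text does, preserves the direction of the difference as well as its size.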
The results of the comparison of mean 
ratings for techniques and FinP ratings 
are presented in Table 6 in the form of 
an abbreviated table of signs. Either a 
plus or a minus sign is entered in the 
table where a statistically significant (.05
level) difference between means occurs. 
The plus sign is used to indicate that the 
technique mean was in the “laudatory” 
direction, while the minus sign indicates 
that the FinP mean was in the “lauda- 
tory” direction. An examination of 
Table 6 reveals that the number of sta- 
tistically significant differences found ex- 
ceeds the number which might have 
been expected if only chance were in- 
volved but these differences are not con- 
sistently in the same direction. In the 
case of the Projective Integrator the ob- 
tained significant differences are always 
in the direction of being less “lauda- 
tory,” and for the Rorschach one may 
note a trend in the same direction.²

² Soskin (6) analyzed ratings based only on student behavior in a series of
standard situations and found that Situationists “see the sample of subjects as
being characterized predominately by the condemnatory pole. . . .”

These data may be of interest insofar as




TABLE 6 


TABLE OF SIGNS SHOWING STATISTICALLY SIGNIFICANT DIFFERENCES BETWEEN
MEAN TECHNIQUE RATINGS AND MEAN FINAL POOLED RATINGS*


Variable Ror-FinP TAT-FinP SC-FinP BG-FinP PI-FinP


24 


4 # 
17 + + 
18 # 
21 # 
22 + # 
23 


* In this table the “+” sign indicates that there is a statistically significant difference (.05 level)
with the technique mean in the “laudatory” direction. The reverse is true for the “—” sign.
# The Projective Integrator did not rate Scale A variables.


they characterize the PI and the Rorschach as tending to look on the dark
side of personality, but probably have little, if any, relation to validity. When
the differences shown in Table 6 were compared with the validity coefficients
which are statistically significant for the techniques and for the Projective
Integrator, it was apparent that statistically significant validity coefficients
were obtained at least as frequently when there was not a difference between
means as when there was a mean difference. It may also be noted that statistically
significant validity coefficients are not consistently associated with differences
either toward or away from the “laudatory” end of the rating scale.


TABLE 7 


TABLE OF STATISTICALLY SIGNIFICANT DIFFERENCES BETWEEN MEAN
RATINGS BASED ON EACH OF THE PROJECTIVE TECHNIQUES*


Variable 


Ror-TAT Ror-SC Ror-BG TAT-SC TAT-BG SC-BG 


* In this table the “+” sign indicates that there is a statistically significant difference (.05 level)
with the first of the two techniques at the head of the column in the “laudatory” direction. The
reverse is true for the “—” sign.


25 
26 | 
27 
28 
29 
30 
31 
42 
4 
17 
18 
21 
22 
23 
24 4 
25 
26 + 4 + 
27 
28 
29 
30 
31
42 




In Table 7 are presented the signs of statistically significant differences
between the mean ratings among the techniques. In this table the minus sign
indicates that the first of the two technique abbreviations at the head of the
column has a mean rating in a less “laudatory” direction. Thus, the minus sign
for Trait #22 in the first column signifies that there is a significant difference
between the mean ratings of the Rorschach and TAT on this trait, and that the
difference is in the direction of the Rorschach being less “laudatory.” The
apparent tendency for ratings based on the Bender-Gestalt to be more often in
the “laudatory” direction than ratings based on the Rorschach or Sentence
Completion may suggest a relationship to differences in validity, since both the
Rorschach and Sentence Completion yielded more and higher statistically
significant validity coefficients than the Bender-Gestalt. This argument is
weakened, however, by the observation that the TAT too yielded more and higher
statistically significant validity coefficients than the Bender-Gestalt without
there being any differences between the means for these two techniques. Again,
one might characterize the Rorschach raters as tending to view people darkly,
while the Bender-Gestalt raters tend in the direction of rating personality in a
relatively benign fashion.


TABLE 8


TABLE OF STATISTICALLY SIGNIFICANT DIFFERENCES BETWEEN STANDARD DEVIATIONS
OF TECHNIQUE RATINGS AND FINAL POOLED RATINGS*


Variable Ror-FinP TAT-FinP SC-FinP BG-FinP PI-FinP
4 #
17 #
18 +
21
22
23 +
24
+


* In this table the “+” sign indicates that there is a statistically significant difference (.05 level)
with the technique ratings having the larger standard deviation. The reverse is true for the “—” sign.
# The Projective Integrator did not rate Scale A variables.


B. DISPERSION (CONFIDENCE) OF RATINGS


Another factor which needs to be eval- 
uated, since it might be related to valid- 
ity of ratings, is the spread of scores or 
ratings. When there is more spread, the 
likelihood of obtaining higher correla- 
tion coefficients is increased. If one is 
willing to assume that where a rater 
lacks confidence in his rating he will 
tend to rate closer to the mean than he 
will when he does feel confident, then we 
can use a measure of spread as an index 
of confidence in evaluating confidence in 
rating as a factor in validity. 

Table 8 is a table of signs showing 
where significant differences in spread 
(standard deviation) occur. A plus sign 


| 
+ | 
25 
r + + + + | 
27 
28 2 
20 
30 | 
42 = I 
| 
level) 2 
he re- | 


indicates that the spread for the technique ratings is significantly greater than
that for the FinP ratings, and vice versa for the minus sign. These data suggest
that dispersion of ratings has little or no relation to validity. The ratings based
on the Bender-Gestalt, which are the least valid of the four sets of ratings,
most often show significantly greater spread than the FinP ratings whereas
the Projective Integration ratings, which are more valid for more traits than the
ratings based on the Bender-Gestalt, are in no instance significantly different in
spread from the FinP ratings. It is interesting to note that ratings based on each
of the four projective techniques (but not the Projective Integration) have
significantly larger standard deviations on Trait #25, “Characteristic Intensity of
Inner Emotional Tension,” than the FinP. If this represents a feeling of
confidence, it appears not to be justified consistently since the TAT and BG ratings
on this trait did not correlate significantly with the criterion ratings.

A comparison of the techniques with each other in terms of willingness to
spread ratings was carried out. The results are presented in Table 9. The data
suggest that ratings based on the Rorschach tend to be made with more spread
than those based on the Sentence Completion and TAT. Similarly, the
Bender-Gestalt tends to inspire dispersion of ratings at least equal to the Rorschach.
This may be a function of the fact that both the Ror and BG were administered
by the rater; i.e., face to face contact may be related to dispersion of ratings
(confidence). This would be consistent with the staff's opinion that the interview
contributed most to their understanding of the case.


TABLE 9


TABLE OF STATISTICALLY SIGNIFICANT DIFFERENCES BETWEEN STANDARD DEVIATIONS
OF RATINGS ON EACH OF THE PROJECTIVE TECHNIQUES*


Variable Ror-TAT Ror-SC Ror-BG TAT-SC TAT-BG SC-BG
+
4.
4
17
18
21
22
23 —
24 +
25
26
27
28
29 — —
30
42


* In this table the “+” sign indicates that there is a statistically significant difference (.05 level)
with the first of the two techniques at the head of the column having the larger standard deviation.
The reverse is true for the “—” sign.


C. ADDITIONAL FACTORS CONCERNING RATERS


It has been shown (Chapter III, B, 2) that different raters contribute
significantly to differences in validities of ratings. One possible explanation of this
finding, that “good” raters may have been assigned to one technique and
“poor” raters to another, is not too tenable since all the projective raters used
the same techniques (except for the Ben- 
der-Gestalt) and, in addition, cases were 
randomly assigned to raters. On the other 
hand the number of cases seen by each 
projective rater varied and the obtained 
difference in validities for raters may 
have been related to difference in size of 
samples. As a way of evaluating whether 
there are differences in validity associ- 
ated with the numbers of cases seen by 
“good” and “poor” raters, the rank-order 
positions of raters were used as an index
of ability to rate. If we consider that 
those raters in the upper half of the rank- 
ing are “good” raters and those in the 
lower half are “poor” raters, then for
Scale B we find the following. For the
Rorschach the “good” raters, W, F, and
T, saw 65 of a total of 125 cases; for the
TAT the “good” raters, W, F, and T,
saw 54 of the 115 cases; for the SC the
“good” raters, F, T, and G, saw 36 of 102 
cases. Also, the size of the median valid- 
ity coefficient for the SC compares favor- 
ably with those of the Rorschach and 
TAT. We may conclude that the size of 
sample for different raters is not signifi- 
cantly related to validity of rating. 

Another possibility to be considered is 
the effect of a given projective technique 
on a rater. There may be something in 
the nature of a given technique which 
effects a change in the clinician who uses 
it. The data in Table 6 suggest that the 
Rorschach is more often associated with 
a tendency to see people in a less benign 
light than are the other techniques. 
When the Rorschach rater in his role of 
Projective Integrator makes a new set 
of ratings, incorporating the results of 
the other projective techniques with his 
own Rorschach results, this tendency to 
see people in a less benign light is in- 
creased. 




D. FURTHER EVIDENCE ON INTER-JUDGE
AND INTER-TECHNIQUE AGREEMENT


As was reported in Chapter III, A, 1, 
this study found that unreliability of cri- 
terion ratings has very little effect on the 
validity of ratings based on projective 
techniques. There is no way of knowing, 
from the present data, to what extent the 
low validities found are a function of the 
unreliability of raters. The question may 
be asked in two ways: to what degree do 
judges agree when they use the same 
technique, and to what degree does a judge
agree with himself using different tech- 
niques? In order to answer these ques- 
tions another study was carried out and 
some preliminary results were available 
at the time of writing. In this new study 
four projectivists made ratings based on 
each of four projective technique rec- 
ords from each of a total of 20 sub- 
jects. The number of subjects and raters 
was limited by practical considerations. 
On each of the 16 days of the experiment 
a rater made ratings based on five projective-technique records from 5 different
subjects. For example, on the first day 
Rater A would have worked with the 
Rorschach records of Subjects 1 and 16, 
the TAT of Subject 10, the SC of Subject 
12, and the B-G of Subject 4. This same 
“block” of records would be given to 
Rater B on the eighth day, to Rater C on 
the fifth day, and to Rater D on the six- 
teenth day. In this way each of the 16 
blocks of five projective-technique rec- 
ords would be worked with by each of 
the four raters, so that at the end of the 
16 days each rater would have rated all 
the projective records on all 20 subjects. 
Tables of random numbers were used to 
assign the projective technique records 
to the 16 “blocks,” after which the 
blocks were treated as units and distrib- 
uted by random numbers to the raters. 




TABLE 10 


MEDIAN INTERCORRELATION COEFFICIENTS FOR
EACH OF FOUR RATERS WHEN RATINGS BASED ON
FOUR TECHNIQUES ARE INTERCORRELATED*


                 Rater
Variable
           A     B     C     D

4 .32 .12 .19 .06

23 —.06 .20 —.03
25 —.15 .06 .18
31 .02 .32 .20
42 .26 .16 .14


* Correlation coefficients which are underlined
are significant at the .05 level.


The ratings were then examined for con- 
sistency of rating for a rater over the 
four techniques, and for the consistency 
of ratings on a given technique by the 
four raters, in both cases for 20 subjects. 
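
The randomized block design described above can be sketched in code. This is an illustrative reconstruction under stated assumptions, not the procedure actually followed: the study used tables of random numbers, whereas the sketch uses a seeded pseudo-random generator, re-draws until no block holds two records from one subject, and gives each rater an independent random ordering of the blocks. All names and the seed are hypothetical.

```python
import random

TECHNIQUES = ["Ror", "TAT", "SC", "BG"]
N_SUBJECTS, BLOCK_SIZE = 20, 5

def make_blocks(rng, max_tries=100_000):
    """Randomly partition the 80 records (4 techniques x 20 subjects) into
    16 blocks of 5, re-drawing until every block has 5 different subjects."""
    records = [(t, s) for t in TECHNIQUES for s in range(1, N_SUBJECTS + 1)]
    for _ in range(max_tries):
        rng.shuffle(records)
        blocks = [records[i:i + BLOCK_SIZE]
                  for i in range(0, len(records), BLOCK_SIZE)]
        if all(len({subj for _, subj in b}) == BLOCK_SIZE for b in blocks):
            return [list(b) for b in blocks]
    raise RuntimeError("no valid partition found")

def make_schedule(rng, raters=("A", "B", "C", "D"), n_blocks=16):
    """For each rater, a random order in which the 16 blocks are worked,
    one block per day, so every rater eventually rates every record."""
    return {r: rng.sample(range(n_blocks), n_blocks) for r in raters}

rng = random.Random(1952)  # arbitrary seed, chosen only for repeatability
blocks = make_blocks(rng)
schedule = make_schedule(rng)
print("Rater A, day 1:", blocks[schedule["A"][0]])
```

Treating each block as a unit means that all four raters see identical sets of records, so differences among raters cannot be attributed to differences in the material rated.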

In Table 10 are shown the results for 
the consistency of the individual raters, 
i.e., the intra-rater consistency over all 
four techniques. The data are presented 
in the form of median values for the 6 intercorrelations of the four techniques.
The single correlation which appears to be significant (Rater C, Trait #31) may
be meaningless in that at the .05 level 1 correlation in 20 may be expected to
reach this level of significance.
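
The median of the 6 intercorrelations among four sets of ratings can be computed as sketched below. This is a minimal illustration with invented ratings; the function names are hypothetical and the product-moment coefficient is assumed, since the text does not name the coefficient used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def median_intercorrelation(ratings_by_source):
    """Median of all pairwise correlations among the rating sets.
    Four sources give 4 * 3 / 2 = 6 pairs."""
    pairs = combinations(ratings_by_source.values(), 2)
    return median([pearson_r(x, y) for x, y in pairs])

# Invented ratings by one rater of the same five subjects on one trait,
# made from each of the four techniques.
ratings_by_technique = {
    "Ror": [3, 5, 4, 6, 2],
    "TAT": [4, 4, 5, 6, 3],
    "SC": [2, 5, 5, 4, 3],
    "BG": [5, 3, 4, 4, 4],
}
print(f"median r = {median_intercorrelation(ratings_by_technique):.2f}")
```

The same function serves for the inter-rater case: the dictionary then holds the ratings of four raters on a single technique instead of one rater on four techniques.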

In Table 11 are shown the inter-rater 




TABLE 11 


MEDIAN INTERCORRELATION COEFFICIENTS FOR
EACH OF FOUR TECHNIQUES WHEN RATINGS BY
EACH RATER ON THAT TECHNIQUE ARE
INTERCORRELATED*


               Technique
Variable
           Ror   TAT   SC    BG

4 .28 .49 .34

18
23 .35 .20 .36 .20

25 .22 .40 .20

31 .58 .16 .51 .34

42 .56 .34


* Correlation coefficients which are underlined
are significant at the .05 level.


(intra-technique) agreements. These 
again are median correlations for the 6 
intercorrelations of the ratings of four 
raters for the same technique. It appears, 
from the data in these two tables, that a 
projective technique tends to generate 
more consistency in rating behavior than 
the use of a single projective rater. Stated 
another way, it would appear that differ- 
ent raters will tend to rate a subject in 
the same way if they use the same projective technique. A given rater will tend
to rate the same subject in a nonconsistent manner when he uses different
techniques.




CHAPTER V 


SUMMARY AND CONCLUSIONS 


A. OBJECTIVES 


THE principal objectives of this study
were: 

1. To determine the relative validities 
of ratings based on four projective tech- 
niques when these were used to describe 
personality and to predict “over-all suita- 
bility” as a clinical psychologist; 

2. To determine whether there were 
differences in the relative validities of 
ratings based on projective techniques 
which were ascribable to the rater; 

3. To determine whether there were 
differences in the relative validities of 
ratings based on projective techniques 
which were ascribable to the technique; 

4. To determine whether there were 
differences in the relative validities of 
ratings based on projective techniques 
which were attributable to the personal- 
ity trait being rated. 

Other related problems were also in- 
vestigated. 


B. METHODS AND PROCEDURES 


During an intensive assessment program, a total of 128 male first-year graduate
students in clinical psychology were rated by ten staff members on 15
personality variables after analysis of protocols from each of four projective
techniques, and again after all the projective
material (except the actual ratings) from 
the four techniques had been examined 
by a “projective integrator.” These rat- 
ings were correlated with another set of 
ratings which were arrived at in a “pool- 
ing” conference of three staff members 
who had studied each subject intensively 
for a week. Since staff members had all 
the information it was possible to obtain
about a subject during an assessment
week, the ratings which the three staff 
members agreed upon in their final con- 
ference were taken as the best-rated de- 
scriptions of a subject’s personality and 
used as criterion measures. These cri- 
terion measures were “contaminated” in 
that they included the protocols and rat- 
ings of the predictors, the effect of which 
would be spuriously to raise the correla- 
tions between ratings based on projective 
techniques and criterion measures. Under 
these conditions the correlations between 
ratings based on projective techniques 
and the final staff ratings were regarded 
as validity indices. By comparing the cor- 
relation coefficients which were obtained 
in this manner, the objectives of this 
study could be achieved. 


C. SUMMARY OF FINDINGS 
1. The projective techniques. 


a. Ratings based on the projective 
techniques correlated significantly 
with the criterion measures more 
frequently than was expected by 
chance. These correlations were 
preponderantly positive, but were, 
by usual standards, low. 

b. Correcting the criterion measures
for attenuation raised the median 
correlations between ratings based 
on projective techniques and the 
criterion measures only slightly. 
The greatest increase was from .31 
to .35. 

c. Ratings based on the Rorschach
and the Bender-Gestalt (made by
the raters who had administered
these techniques) tended to be
made with greater confidence, i.e.,
showed greater spread, than ratings
based on the Thematic Apperception
Test and the Sentence Completion
(which were rated “blind”).

d. Ratings based on the Rorschach 
tended to be made away from the 
“laudatory” end of the rating scale
when compared with ratings based 
on the other projective techniques 
and with the criterion ratings. 

e. There appeared to be little or no 
relation between validity of ratings 
and bias of ratings toward either 
the “laudatory” or “derogatory” 
ends of the rating scale. 

f. There appeared to be little or no 
relation between validity of ratings 
and spread of ratings. 

g. Preliminary findings suggested that 
there was more correlation be- 
tween ratings made by different 
raters using the same projective 
technique, than between ratings 
made by a single rater using differ- 
ent projective techniques. 


2. The projective raters. 


a. Differences in the validity of rat- 
ings based on projective techniques 
were found which could be attri- 
buted to the individuals making 
the ratings. 

b. Significant differences in the validity
of ratings obtained not only between
raters, but for the interaction
between rater and technique,
for both scales and for the
Rorschach, the Thematic Apperception
Test, and the Sentence
Completion; interaction could not
be tested for the Bender-Gestalt.

c. Although some raters were “better”
than others, in no instance did the
median validity coefficients of
statistical significance exceed r = .48.


3. The Traits. 


No significant differences were 
found in the validity of ratings 
which were attributable to the dif- 
ferences in personality traits rated. 
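
The correction for attenuation mentioned in finding 1b can be written out explicitly. The form below is the standard correction applied to the criterion only; that the study used exactly this form is an assumption made here, and the symbols are introduced for illustration: r_xy is the observed correlation between predictor and criterion ratings, and r_yy is the reliability of the criterion ratings.

```latex
r'_{xy} = \frac{r_{xy}}{\sqrt{r_{yy}}}
\qquad\Longrightarrow\qquad
r_{yy} = \left(\frac{r_{xy}}{r'_{xy}}\right)^{2}
```

Under that assumption, the largest reported increase, from .31 to .35, would correspond to a criterion reliability of about (.31/.35)² ≈ .78. This figure is an inference from the formula and is not a value reported in the study.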


D. CONCLUSIONS 


The conclusions which are drawn 
from the findings of this study are: 


1. Projective techniques, used in the
assessment of personality characteristics,
measure very little in common.

2. There are significant individual differences
in the ability to make valid
ratings of personality traits from
projective techniques, which appear
to be independent of the technique
used.

3. The dispersion of ratings made by
clinicians on personality traits on
the basis of projective techniques
does not appear to be related to the
validity of ratings.

4. The assessment of the degree to
which a subject possesses socially
desirable personality traits is, in
part, a function of the projective
technique which is used in making
the assessment.

5. The value of projective techniques
as instruments for the assessment of
specified personality traits in a
group of normal superior adults is
apparently limited by low validities.



BIBLIOGRAPHY 


1. CATTELL, R. B. Description and measurement
of personality. Yonkers-on-Hudson: World
Book Co., 1946.

2. FISKE, D. W. Consistency of factorial structures
of personality ratings from different sources.
J. abnorm. soc. Psychol., 1949, 44, 329-344.

3. KELLY, E. L. Research on the selection of
clinical psychologists. J. clin. Psychol., 1947,
3, 39-42.

4. KELLY, E. L., & FISKE, D. W. The prediction
of success in the VA training program in
clinical psychology. Amer. Psychologist,
1950, 5, 395-406.

5. SAMUELS, H. An analysis of some factors
affecting ratings of personality traits based on
projective techniques. Ph.D. Thesis, Univer.
Mich., 1950.

6. SOSKIN, W. F. A study of personality ratings
based upon brief observations of behavior
in standard situations. Microfilm Abstr.
(Ph.D. Thesis) University Microfilms, Ann
Arbor, Mich. Publ. No. 1208.

7. TUPES, E. C. An evaluation of personality trait
ratings obtained by unstructured assessment
interviews. Psychol. Monogr., 1950, 64, No.
11 (Whole No. 317).




